Overview

This document presents comprehensive simulation results for all four endpoint types:

  • Time-to-Event (TTE): Survival analysis with hazard ratios
  • Binary: Logistic regression with odds ratios
  • Count: Negative binomial with rate ratios
  • Continuous: Linear regression with mean differences

The analysis includes:

  1. Standardized RMSE plots across all scenarios for each endpoint
  2. Performance tables (RMSE, Bias, Coverage) for all endpoints
  3. Additional endpoint-specific visualizations (TTE, Binary, Count, Continuous)


Analysis by Endpoint

## 
## === Truth Data Export Summary ===
## Subgroup coefficients:  600  rows
## Overall treatment effects:  24  rows
## Files saved in: simulations/Results/

Standardized RMSE Plots (All Endpoints Combined)

These plots show the Root Mean Squared Error (RMSE) standardized relative to the baseline “Subgroup (no shrinkage)” estimator across all six simulation scenarios.

Interpretation: Values below 1.0 indicate better performance than the baseline subgroup estimator. Lower values indicate better estimation accuracy.
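
As a minimal sketch of the standardization described above, each estimator's RMSE can be divided by the RMSE of the "Subgroup (no shrinkage)" baseline. The helper name `standardize_rmse` is hypothetical, and the sketch assumes the ratio is taken per subgroup; the report does not state whether the plots standardize per subgroup or after aggregation. The input values are the Binary, Scenario 2, subgroup 4b rows from the R2D2 focus table in this report.

```python
# Hypothetical helper (not from the report's code): divide every estimator's
# RMSE by the RMSE of the baseline estimator.
def standardize_rmse(rmse_by_estimator, baseline="Subgroup (no shrinkage)"):
    base = rmse_by_estimator[baseline]
    return {name: value / base for name, value in rmse_by_estimator.items()}

# RMSE values for Binary, Scenario 2, subgroup 4b (R2D2 focus table).
rmse_4b = {
    "GM - R2D2 (Mid)": 0.222,
    "GM - R2D2 (Mid, unshrunk×4)": 0.273,
    "Subgroup (no shrinkage)": 0.271,
}

standardized = standardize_rmse(rmse_4b)
for name, ratio in standardized.items():
    # Ratios below 1.0 mean the estimator beats the baseline on RMSE.
    print(f"{name}: {ratio:.2f}")
```

In this example the standard R2D2 estimator lands at about 0.82 and the unshrunk variant at about 1.01, so only the former improves on the baseline for this subgroup.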


RMSE Plots (All Endpoints Combined, Non-Standardized)

These plots show the Root Mean Squared Error (RMSE) in absolute terms across all six simulation scenarios.

Interpretation: Lower values indicate better estimation accuracy. Note that RMSE values are on different scales across endpoints due to different effect sizes and metrics (log hazard ratios for TTE, log odds ratios for Binary, log rate ratios for Count, and mean differences for Continuous).


Absolute Bias Plots (All Endpoints Combined)

These plots show the average absolute bias in treatment effect estimation across all six simulation scenarios.

Interpretation: Lower values indicate more accurate estimates. Absolute bias measures the magnitude of estimation error regardless of direction (over- or underestimation).


R2D2 Comparison: Unshrunk x4 Effect

This section compares the GM - R2D2 (Mid) estimator with its variant where x4 interactions are left unshrunk. The plot shows RMSE across scenarios to evaluate the impact of selective unshrinking.

R2D2 focus: RMSE/Bias/Coverage for subgroups 4a-4c (Scenarios 2 & 3)
Endpoint Scenario Subgroup Estimator RMSE Bias Coverage
Binary 2 4a GM - R2D2 (Mid) 0.381 -0.326 0.654
Binary 2 4a GM - R2D2 (Mid, unshrunk×4) 0.263 -0.007 0.941
Binary 2 4a Subgroup (no shrinkage) 0.259 -0.006 0.952
Binary 2 4b GM - R2D2 (Mid) 0.222 0.144 0.910
Binary 2 4b GM - R2D2 (Mid, unshrunk×4) 0.273 -0.013 0.944
Binary 2 4b Subgroup (no shrinkage) 0.271 -0.009 0.945
Binary 2 4c GM - R2D2 (Mid) 0.215 0.135 0.905
Binary 2 4c GM - R2D2 (Mid, unshrunk×4) 0.238 -0.009 0.947
Binary 2 4c Subgroup (no shrinkage) 0.235 -0.007 0.956
Binary 3 4a GM - R2D2 (Mid) 0.480 0.389 0.643
Binary 3 4a GM - R2D2 (Mid, unshrunk×4) 0.275 0.005 0.937
Binary 3 4a Subgroup (no shrinkage) 0.273 0.008 0.948
Binary 3 4b GM - R2D2 (Mid) 0.258 -0.166 0.850
Binary 3 4b GM - R2D2 (Mid, unshrunk×4) 0.281 -0.002 0.933
Binary 3 4b Subgroup (no shrinkage) 0.280 -0.003 0.932
Binary 3 4c GM - R2D2 (Mid) 0.244 -0.154 0.840
Binary 3 4c GM - R2D2 (Mid, unshrunk×4) 0.231 0.000 0.941
Binary 3 4c Subgroup (no shrinkage) 0.228 0.001 0.947
Continuous 2 4a GM - R2D2 (Mid) 0.298 -0.236 0.798
Continuous 2 4a GM - R2D2 (Mid, unshrunk×4) 0.276 0.018 0.947
Continuous 2 4a Subgroup (no shrinkage) 0.275 0.018 0.946
Continuous 2 4b GM - R2D2 (Mid) 0.204 0.117 0.931
Continuous 2 4b GM - R2D2 (Mid, unshrunk×4) 0.274 -0.002 0.945
Continuous 2 4b Subgroup (no shrinkage) 0.273 -0.002 0.949
Continuous 2 4c GM - R2D2 (Mid) 0.201 0.110 0.929
Continuous 2 4c GM - R2D2 (Mid, unshrunk×4) 0.244 0.009 0.955
Continuous 2 4c Subgroup (no shrinkage) 0.243 0.008 0.953
Continuous 3 4a GM - R2D2 (Mid) 0.403 0.343 0.655
Continuous 3 4a GM - R2D2 (Mid, unshrunk×4) 0.277 0.020 0.947
Continuous 3 4a Subgroup (no shrinkage) 0.275 0.018 0.946
Continuous 3 4b GM - R2D2 (Mid) 0.225 -0.140 0.898
Continuous 3 4b GM - R2D2 (Mid, unshrunk×4) 0.275 -0.004 0.947
Continuous 3 4b Subgroup (no shrinkage) 0.273 -0.002 0.949
Continuous 3 4c GM - R2D2 (Mid) 0.220 -0.130 0.900
Continuous 3 4c GM - R2D2 (Mid, unshrunk×4) 0.244 0.008 0.953
Continuous 3 4c Subgroup (no shrinkage) 0.243 0.008 0.953
Count 2 4a GM - R2D2 (Mid) 0.381 -0.328 0.684
Count 2 4a GM - R2D2 (Mid, unshrunk×4) 0.262 0.006 0.959
Count 2 4a Subgroup (no shrinkage) 0.259 0.000 0.946
Count 2 4b GM - R2D2 (Mid) 0.235 0.155 0.908
Count 2 4b GM - R2D2 (Mid, unshrunk×4) 0.283 -0.017 0.957
Count 2 4b Subgroup (no shrinkage) 0.275 -0.009 0.944
Count 2 4c GM - R2D2 (Mid) 0.225 0.149 0.911
Count 2 4c GM - R2D2 (Mid, unshrunk×4) 0.252 -0.013 0.955
Count 2 4c Subgroup (no shrinkage) 0.246 -0.001 0.941
Count 3 4a GM - R2D2 (Mid) 0.500 0.424 0.657
Count 3 4a GM - R2D2 (Mid, unshrunk×4) 0.277 -0.010 0.954
Count 3 4a Subgroup (no shrinkage) 0.269 0.000 0.949
Count 3 4b GM - R2D2 (Mid) 0.257 -0.176 0.875
Count 3 4b GM - R2D2 (Mid, unshrunk×4) 0.272 0.001 0.942
Count 3 4b Subgroup (no shrinkage) 0.266 -0.003 0.935
Count 3 4c GM - R2D2 (Mid) 0.253 -0.174 0.866
Count 3 4c GM - R2D2 (Mid, unshrunk×4) 0.239 -0.008 0.947
Count 3 4c Subgroup (no shrinkage) 0.237 -0.012 0.937
Time-to-Event (TTE) 2 4a GM - R2D2 (Mid) 0.301 -0.235 0.770
Time-to-Event (TTE) 2 4a GM - R2D2 (Mid, unshrunk×4) 0.211 0.006 0.950
Time-to-Event (TTE) 2 4a Subgroup (no shrinkage) 0.203 0.003 0.954
Time-to-Event (TTE) 2 4b GM - R2D2 (Mid) 0.232 0.154 0.893
Time-to-Event (TTE) 2 4b GM - R2D2 (Mid, unshrunk×4) 0.258 -0.008 0.939
Time-to-Event (TTE) 2 4b Subgroup (no shrinkage) 0.248 0.009 0.944
Time-to-Event (TTE) 2 4c GM - R2D2 (Mid) 0.226 0.148 0.876
Time-to-Event (TTE) 2 4c GM - R2D2 (Mid, unshrunk×4) 0.253 -0.023 0.949
Time-to-Event (TTE) 2 4c Subgroup (no shrinkage) 0.248 -0.009 0.953
Time-to-Event (TTE) 3 4a GM - R2D2 (Mid) 0.459 0.363 0.704
Time-to-Event (TTE) 3 4a GM - R2D2 (Mid, unshrunk×4) 0.279 -0.023 0.940
Time-to-Event (TTE) 3 4a Subgroup (no shrinkage) 0.271 -0.004 0.944
Time-to-Event (TTE) 3 4b GM - R2D2 (Mid) 0.209 -0.125 0.909
Time-to-Event (TTE) 3 4b GM - R2D2 (Mid, unshrunk×4) 0.219 0.005 0.943
Time-to-Event (TTE) 3 4b Subgroup (no shrinkage) 0.211 -0.001 0.945
Time-to-Event (TTE) 3 4c GM - R2D2 (Mid) 0.204 -0.119 0.922
Time-to-Event (TTE) 3 4c GM - R2D2 (Mid, unshrunk×4) 0.217 0.010 0.950
Time-to-Event (TTE) 3 4c Subgroup (no shrinkage) 0.211 0.003 0.963

Interpretation:

  • Subgroup (orange): Baseline with no shrinkage; nearly unbiased, with RMSE driven by sampling noise
  • GM - R2D2 Mid (medium green): Standard R2D2 with shrinkage on all interactions
  • GM - R2D2 Mid, unshrunk×4 (dark green): Selective unshrinking of x4 interactions to preserve heterogeneity signal

Compare performance in Scenario 2 (heterogeneous in x4) vs other scenarios to evaluate the benefit of selective unshrinking.
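
The RMSE and Bias columns of the table above are linked by the standard identity RMSE² = bias² + variance, which gives a rough way to see what shrinkage trades away. The sketch below backs out the implied standard deviation for the Binary, Scenario 2, subgroup 4a rows. The function name `implied_sd` is hypothetical, and because the table values are rounded to three decimals the results are approximate.

```python
import math

# Hypothetical helper: recover the sampling SD implied by RMSE and bias,
# using RMSE^2 = bias^2 + variance. Clamped at zero to guard against rounding.
def implied_sd(rmse, bias):
    return math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))

# (RMSE, Bias) for Binary, Scenario 2, subgroup 4a, copied from the table above.
rows_4a = {
    "GM - R2D2 (Mid)": (0.381, -0.326),
    "GM - R2D2 (Mid, unshrunk×4)": (0.263, -0.007),
    "Subgroup (no shrinkage)": (0.259, -0.006),
}

sd_4a = {name: implied_sd(rmse, bias) for name, (rmse, bias) in rows_4a.items()}
for name, sd in sd_4a.items():
    print(f"{name}: implied SD = {sd:.3f}")
```

For this subgroup the standard R2D2 estimator has the smallest implied SD (about 0.20 versus about 0.26 for the other two), but its bias of -0.326 more than offsets that gain, which is why its RMSE is the largest of the three.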

Forest Plot: Scenario 3 - R2D2 Comparison

Scenario 3 has heterogeneity in x4 subgroups (negative treatment effect except for one subgroup). This forest plot compares the three estimators in capturing this heterogeneous pattern.

Interpretation:

  • Subgroup (orange): No shrinkage; captures the heterogeneity but with wide intervals
  • GM - R2D2 Mid (medium green): Applies shrinkage to all interactions and shrinks part of the signal (bias of 0.34 to 0.42 in subgroup 4a)
  • GM - R2D2 Mid, unshrunk×4 (dark green): Preserves x4 heterogeneity by not shrinking x4 interactions

The plot shows whether unshrinking x4 interactions helps preserve the heterogeneous treatment effect while maintaining reasonable interval widths.


Standardized RMSE Plots by Endpoint

Individual plots for each endpoint with enhanced visibility.

Time-to-Event (TTE) Endpoint


Binary Endpoint


Count Endpoint


Continuous Endpoint


Coverage Issues in Scenario 2 - Forest Plots

This section illustrates the undercoverage problem in heterogeneous subgroups (Scenario 2). The plots show average treatment effect estimates with 95% credible intervals for subgroups 4aa and 4ab (where x4=a), which deviate from the overall positive treatment effect. The black ‘X’ marks the true effect.

Time-to-Event (TTE) Endpoint

Binary Endpoint

Count Endpoint

Continuous Endpoint

Interpretation:

  • Population estimator (purple): Completely shrinks estimates toward the population mean, missing the heterogeneity signal. Credible intervals are narrow but fail to cover the truth: coverage averaged over heterogeneous subgroups is 54% for TTE and 61-67% for the other endpoints, and falls to 15-45% in the x4a subgroups.

  • Subgroup estimator (orange): Correctly captures the heterogeneity with estimates close to the truth. Maintains nominal coverage (~95%) but has wider intervals.

  • GM - R2D2 Mid (green): Applies partial shrinkage, improving precision while still capturing part of the heterogeneity signal. Coverage in the x4a subgroups drops to 65-80% (86-89% when averaged over all heterogeneous subgroups), but this avoids the extreme undercoverage of the population estimator.

  • OVAT - Hierarchical (red): Similar to the subgroup estimator, with near-nominal coverage (92-94% in the x4a subgroups), using hierarchical priors to stabilize estimates.


Heterogeneous Subgroup Identification (Scenario 2)

This section analyzes how well each estimator identifies the heterogeneous subgroup (x4) in Scenario 2, where one subgroup has a different treatment effect.

Time-to-Event (TTE) Endpoint

Heterogeneous Subgroup Identification - TTE Endpoint (Scenario 2)
Estimator Identification Rate Absolute Bias (x4a) Coverage (x4a)
Population (full shrinkage) 1.000 0.388 0.147
GM - Horseshoe (Low) 0.004 0.293 0.712
GM - Horseshoe (Mid) 0.004 0.328 0.607
GM - R2D2 (Strong) 0.004 0.264 0.775
GM - Horseshoe (Strong) 0.003 0.337 0.567
GM - R2D2 (Low) 0.002 0.260 0.760
GM - R2D2 (Mid) 0.002 0.260 0.770
OVAT - Hierarchical 0.002 0.178 0.928
Subgroup (no shrinkage) 0.001 0.159 0.954
Identification Rate: Proportion of replications where x4a subgroups (x4=a) had the most extreme estimate. Bias and coverage computed only for x4a subgroups. Based on 1000 replications.
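
One possible reading of the identification rate defined above is sketched below. It assumes that "most extreme" means the largest absolute deviation from the mean of the subgroup estimates within a replication, and that ties count as an identification. Both are assumptions for illustration, not confirmed by the report. Under this tie rule, an estimator that returns the same value for every subgroup would score 1.000, which may be why the population estimator shows that rate. The function name and the toy numbers are made up.

```python
# Hypothetical sketch: share of replications in which the target subgroup has
# the largest absolute deviation from the mean of all subgroup estimates.
# Ties are counted as hits (an assumption, see the text above).
def identification_rate(replications, target):
    hits = 0
    for estimates in replications:
        center = sum(estimates.values()) / len(estimates)
        deviation = {name: abs(value - center) for name, value in estimates.items()}
        if deviation[target] >= max(deviation.values()):
            hits += 1
    return hits / len(replications)

# Toy replications with made-up estimates (not simulation output).
toy_replications = [
    {"4a": -0.30, "4b": 0.25, "4c": 0.20},  # 4a deviates most
    {"4a": 0.10, "4b": 0.45, "4c": 0.05},   # 4b deviates most
    {"4a": 0.20, "4b": 0.20, "4c": 0.20},   # identical estimates: a tie
]

rate = identification_rate(toy_replications, "4a")
print(f"Identification rate for 4a: {rate:.3f}")
```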

Binary Endpoint

Heterogeneous Subgroup Identification - Binary Endpoint (Scenario 2)
Estimator Identification Rate Absolute Bias (x4a) Coverage (x4a)
Population (full shrinkage) 1.000 0.433 0.150
GM - Horseshoe (Mid) 0.006 0.394 0.503
GM - Horseshoe (Low) 0.004 0.362 0.639
GM - Horseshoe (Strong) 0.004 0.401 0.470
GM - R2D2 (Strong) 0.003 0.349 0.668
GM - R2D2 (Mid) 0.002 0.344 0.654
GM - R2D2 (Low) 0.001 0.343 0.640
OVAT - Hierarchical 0.001 0.228 0.917
Subgroup (no shrinkage) 0.000 0.206 0.952
Identification Rate: Proportion of replications where x4a subgroups (x4=a) had the most extreme estimate. Bias and coverage computed only for x4a subgroups. Based on 1000 replications.

Count Endpoint

Heterogeneous Subgroup Identification - Count Endpoint (Scenario 2)
Estimator Identification Rate Absolute Bias (x4a) Coverage (x4a)
Population (full shrinkage) 1.000 0.374 0.305
GM - Horseshoe (Low) 0.004 0.357 0.676
GM - Horseshoe (Mid) 0.004 0.391 0.554
GM - R2D2 (Strong) 0.004 0.346 0.699
GM - Horseshoe (Strong) 0.003 0.398 0.521
GM - R2D2 (Low) 0.003 0.342 0.672
GM - R2D2 (Mid) 0.003 0.342 0.684
OVAT - Hierarchical 0.002 0.224 0.928
OVAT - Hierarchical 0.002 0.223 0.927
Subgroup (no shrinkage) 0.001 0.207 0.946
Identification Rate: Proportion of replications where x4a subgroups (x4=a) had the most extreme estimate. Bias and coverage computed only for x4a subgroups. Based on 1000 replications.

Continuous Endpoint

Heterogeneous Subgroup Identification - Continuous Endpoint (Scenario 2)
Estimator Identification Rate Absolute Bias (x4a) Coverage (x4a)
Population (full shrinkage) 1.000 0.315 0.451
Subgroup (no shrinkage) 0.019 0.222 0.946
OVAT - Hierarchical 0.012 0.210 0.944
GM - Horseshoe (Low) 0.010 0.269 0.786
GM - R2D2 (Strong) 0.010 0.264 0.800
GM - Horseshoe (Mid) 0.009 0.288 0.711
GM - Horseshoe (Strong) 0.008 0.293 0.680
GM - R2D2 (Low) 0.008 0.257 0.802
GM - R2D2 (Mid) 0.008 0.258 0.798
Identification Rate: Proportion of replications where x4a subgroups (x4=a) had the most extreme estimate. Bias and coverage computed only for x4a subgroups. Based on 1000 replications.

Performance Tables (All Endpoints)

The following tables present RMSE, absolute bias, and coverage of 95% confidence/credible intervals for all estimators, stratified by subgroup heterogeneity.
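
As a rough illustration of how these three criteria are commonly computed for a single subgroup, the sketch below takes the point estimates and interval bounds from a set of replications together with the true effect. The function name `performance_metrics` and the toy inputs are hypothetical; the report's own aggregation across subgroups and scenarios may differ.

```python
import math

# Hypothetical helper: RMSE, bias, absolute bias and interval coverage for one
# subgroup, given one estimate and one interval per simulation replication.
def performance_metrics(estimates, lowers, uppers, truth):
    n = len(estimates)
    errors = [est - truth for est in estimates]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    covered = sum(lo <= truth <= hi for lo, hi in zip(lowers, uppers))
    return {
        "rmse": rmse,
        "bias": bias,
        "abs_bias": abs(bias),
        "coverage": covered / n,
    }

# Toy input with made-up numbers (not simulation output).
estimates = [0.1, 0.3, 0.2, 0.4]
lowers = [-0.1, 0.1, 0.0, 0.25]
uppers = [0.3, 0.5, 0.4, 0.55]

metrics = performance_metrics(estimates, lowers, uppers, truth=0.2)
print(metrics)
```

In the toy input, three of the four intervals contain the true value of 0.2, so the coverage is 0.75.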

Time-to-Event (TTE) Endpoint

Performance Metrics for Time-to-Event (TTE) Endpoint
Criterion All subgroups Homogeneous Heterogeneous
RMSE
GM - Horseshoe (Low) 0.16 (0.13-0.49) 0.14 (0.13-0.17) 0.25 (0.17-0.49)
GM - Horseshoe (Mid) 0.16 (0.13-0.55) 0.14 (0.13-0.17) 0.26 (0.17-0.55)
GM - Horseshoe (Strong) 0.16 (0.13-0.56) 0.14 (0.13-0.17) 0.26 (0.17-0.56)
GM - R2D2 (Low) 0.16 (0.13-0.47) 0.15 (0.13-0.20) 0.23 (0.16-0.47)
GM - R2D2 (Mid) 0.16 (0.13-0.46) 0.15 (0.13-0.19) 0.23 (0.16-0.46)
GM - R2D2 (Strong) 0.16 (0.13-0.45) 0.14 (0.13-0.17) 0.24 (0.17-0.45)
OVAT - Hierarchical 0.19 (0.14-0.33) 0.18 (0.14-0.24) 0.22 (0.15-0.33)
Population (full shrinkage) 0.16 (0.13-0.68) 0.13 (0.13-0.16) 0.29 (0.17-0.68)
Subgroup (no shrinkage) 0.23 (0.14-0.36) 0.22 (0.14-0.36) 0.24 (0.15-0.34)
Absolute bias
GM - Horseshoe (Low) 0.04 (0.00-0.36) 0.02 (0.00-0.08) 0.15 (0.05-0.36)
GM - Horseshoe (Mid) 0.05 (0.00-0.41) 0.02 (0.00-0.09) 0.18 (0.06-0.41)
GM - Horseshoe (Strong) 0.05 (0.00-0.43) 0.02 (0.00-0.09) 0.18 (0.07-0.43)
GM - R2D2 (Low) 0.04 (0.00-0.38) 0.02 (0.00-0.07) 0.14 (0.03-0.38)
GM - R2D2 (Mid) 0.04 (0.00-0.36) 0.02 (0.00-0.08) 0.14 (0.03-0.36)
GM - R2D2 (Strong) 0.04 (0.00-0.36) 0.02 (0.00-0.08) 0.14 (0.04-0.36)
OVAT - Hierarchical 0.02 (0.00-0.18) 0.01 (0.00-0.05) 0.06 (0.02-0.18)
Population (full shrinkage) 0.06 (0.00-0.67) 0.01 (0.00-0.09) 0.25 (0.11-0.67)
Subgroup (no shrinkage) 0.01 (0.00-0.02) 0.01 (0.00-0.02) 0.01 (0.00-0.02)
Coverage of 95% CI
GM - Horseshoe (Low) 94% (71%-98%) 96% (92%-98%) 86% (71%-95%)
GM - Horseshoe (Mid) 92% (59%-97%) 95% (91%-97%) 81% (59%-92%)
GM - Horseshoe (Strong) 92% (56%-97%) 95% (91%-97%) 79% (56%-92%)
GM - R2D2 (Low) 95% (66%-99%) 97% (92%-99%) 89% (66%-97%)
GM - R2D2 (Mid) 95% (70%-99%) 97% (92%-99%) 89% (70%-96%)
GM - R2D2 (Strong) 95% (74%-99%) 96% (91%-99%) 88% (74%-95%)
OVAT - Hierarchical 96% (89%-99%) 96% (94%-99%) 94% (89%-98%)
Population (full shrinkage) 87% (0%-96%) 95% (88%-96%) 54% (0%-86%)
Subgroup (no shrinkage) 95% (93%-97%) 95% (93%-97%) 95% (94%-97%)
Summary across 150 subgroup-scenario combinations (25 subgroups × 6 scenarios), reported overall and separately for homogeneous and heterogeneous subgroups.
Monte Carlo SE of estimated coverage ≈ 0.7% (1000 simulations, 95% true coverage).
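
The Monte Carlo standard error quoted above follows from the binomial formula sqrt(p(1 - p) / n) for a proportion estimated from n independent replications. A minimal sketch, with the hypothetical function name `coverage_mcse`:

```python
import math

# Binomial standard error of a coverage proportion estimated from n_sims
# independent replications when the true coverage is p.
def coverage_mcse(p, n_sims):
    return math.sqrt(p * (1 - p) / n_sims)

mcse_nominal = coverage_mcse(0.95, 1000)
print(f"Monte Carlo SE: {mcse_nominal:.4f}")  # about 0.0069, i.e. 0.7%

# Approximate range within which an estimated coverage would fall about 95%
# of the time if the true coverage were exactly 95%.
lower = 0.95 - 1.96 * mcse_nominal
upper = 0.95 + 1.96 * mcse_nominal
print(f"Expected range under nominal coverage: {lower:.3f} to {upper:.3f}")
```

Under this approximation, estimated coverages between roughly 93.6% and 96.4% are compatible with nominal 95% coverage.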

Binary Endpoint

Performance Metrics for Binary Endpoint
Criterion All subgroups Homogeneous Heterogeneous
RMSE
GM - Horseshoe (Low) 0.17 (0.14-0.63) 0.15 (0.14-0.18) 0.26 (0.16-0.63)
GM - Horseshoe (Mid) 0.17 (0.14-0.68) 0.15 (0.14-0.18) 0.27 (0.17-0.68)
GM - Horseshoe (Strong) 0.17 (0.14-0.69) 0.15 (0.14-0.18) 0.28 (0.17-0.69)
GM - R2D2 (Low) 0.17 (0.14-0.59) 0.15 (0.14-0.19) 0.25 (0.16-0.59)
GM - R2D2 (Mid) 0.17 (0.14-0.59) 0.15 (0.14-0.19) 0.25 (0.16-0.59)
GM - R2D2 (Strong) 0.17 (0.14-0.61) 0.15 (0.14-0.18) 0.26 (0.16-0.61)
OVAT - Hierarchical 0.20 (0.15-0.42) 0.19 (0.15-0.26) 0.24 (0.16-0.42)
Population (full shrinkage) 0.17 (0.14-0.76) 0.15 (0.14-0.17) 0.29 (0.18-0.76)
Subgroup (no shrinkage) 0.25 (0.16-0.39) 0.25 (0.16-0.39) 0.26 (0.16-0.39)
Absolute bias
GM - Horseshoe (Low) 0.04 (0.00-0.53) 0.01 (0.00-0.08) 0.18 (0.07-0.53)
GM - Horseshoe (Mid) 0.05 (0.00-0.60) 0.01 (0.00-0.08) 0.20 (0.08-0.60)
GM - Horseshoe (Strong) 0.05 (0.00-0.62) 0.01 (0.00-0.08) 0.20 (0.08-0.62)
GM - R2D2 (Low) 0.04 (0.00-0.52) 0.01 (0.00-0.07) 0.17 (0.06-0.52)
GM - R2D2 (Mid) 0.04 (0.00-0.52) 0.01 (0.00-0.07) 0.17 (0.06-0.52)
GM - R2D2 (Strong) 0.04 (0.00-0.52) 0.01 (0.00-0.07) 0.17 (0.06-0.52)
OVAT - Hierarchical 0.02 (0.00-0.26) 0.01 (0.00-0.04) 0.07 (0.02-0.26)
Population (full shrinkage) 0.06 (0.00-0.75) 0.01 (0.00-0.09) 0.25 (0.10-0.75)
Subgroup (no shrinkage) 0.01 (0.00-0.03) 0.01 (0.00-0.03) 0.01 (0.00-0.02)
Coverage of 95% CI
GM - Horseshoe (Low) 94% (59%-98%) 96% (93%-98%) 84% (59%-96%)
GM - Horseshoe (Mid) 92% (45%-97%) 95% (92%-97%) 78% (45%-93%)
GM - Horseshoe (Strong) 92% (40%-97%) 95% (91%-97%) 77% (40%-93%)
GM - R2D2 (Low) 94% (59%-99%) 96% (94%-99%) 86% (59%-97%)
GM - R2D2 (Mid) 94% (61%-99%) 96% (93%-99%) 86% (61%-96%)
GM - R2D2 (Strong) 94% (63%-99%) 96% (93%-99%) 85% (63%-96%)
OVAT - Hierarchical 96% (91%-99%) 97% (94%-99%) 95% (91%-98%)
Population (full shrinkage) 88% (0%-96%) 95% (89%-96%) 61% (0%-89%)
Subgroup (no shrinkage) 95% (93%-97%) 95% (93%-97%) 95% (93%-96%)
Summary across 150 subgroup-scenario combinations (25 subgroups × 6 scenarios), reported overall and separately for homogeneous and heterogeneous subgroups.
Monte Carlo SE of estimated coverage ≈ 0.7% (1000 simulations, 95% true coverage).

Count Endpoint

Performance Metrics for Count Endpoint
Criterion All subgroups Homogeneous Heterogeneous
RMSE
GM - Horseshoe (Low) 0.18 (0.15-0.64) 0.16 (0.15-0.23) 0.27 (0.17-0.64)
GM - Horseshoe (Mid) 0.19 (0.15-0.68) 0.16 (0.15-0.23) 0.28 (0.16-0.68)
GM - Horseshoe (Strong) 0.19 (0.15-0.69) 0.16 (0.15-0.23) 0.28 (0.16-0.69)
GM - R2D2 (Low) 0.18 (0.15-0.59) 0.17 (0.15-0.23) 0.26 (0.17-0.59)
GM - R2D2 (Mid) 0.18 (0.15-0.60) 0.17 (0.15-0.23) 0.26 (0.17-0.60)
GM - R2D2 (Strong) 0.18 (0.15-0.62) 0.16 (0.15-0.23) 0.27 (0.17-0.62)
OVAT - Hierarchical 0.21 (0.16-0.47) 0.20 (0.16-0.27) 0.25 (0.18-0.47)
Population (full shrinkage) 0.18 (0.15-0.87) 0.16 (0.15-0.18) 0.31 (0.19-0.87)
Subgroup (no shrinkage) 0.26 (0.16-0.44) 0.26 (0.16-0.42) 0.28 (0.20-0.44)
Absolute bias
GM - Horseshoe (Low) 0.06 (0.00-0.51) 0.03 (0.00-0.15) 0.18 (0.02-0.51)
GM - Horseshoe (Mid) 0.07 (0.00-0.58) 0.04 (0.00-0.17) 0.21 (0.01-0.58)
GM - Horseshoe (Strong) 0.07 (0.00-0.60) 0.04 (0.00-0.17) 0.21 (0.01-0.60)
GM - R2D2 (Low) 0.06 (0.00-0.51) 0.03 (0.00-0.13) 0.18 (0.01-0.51)
GM - R2D2 (Mid) 0.06 (0.00-0.51) 0.03 (0.00-0.13) 0.18 (0.01-0.51)
GM - R2D2 (Strong) 0.06 (0.00-0.51) 0.03 (0.00-0.14) 0.18 (0.02-0.51)
OVAT - Hierarchical 0.02 (0.00-0.26) 0.01 (0.00-0.05) 0.07 (0.02-0.26)
Population (full shrinkage) 0.06 (0.00-0.86) 0.02 (0.00-0.09) 0.25 (0.10-0.86)
Subgroup (no shrinkage) 0.01 (0.00-0.03) 0.01 (0.00-0.03) 0.01 (0.00-0.02)
Coverage of 95% CI
GM - Horseshoe (Low) 94% (65%-98%) 96% (87%-98%) 86% (65%-97%)
GM - Horseshoe (Mid) 92% (52%-97%) 95% (82%-97%) 81% (52%-97%)
GM - Horseshoe (Strong) 92% (49%-97%) 95% (82%-97%) 79% (49%-96%)
GM - R2D2 (Low) 95% (63%-99%) 97% (88%-99%) 88% (63%-98%)
GM - R2D2 (Mid) 95% (66%-99%) 96% (88%-99%) 88% (66%-98%)
GM - R2D2 (Strong) 95% (67%-99%) 96% (87%-99%) 87% (67%-97%)
OVAT - Hierarchical 96% (87%-99%) 96% (93%-99%) 94% (87%-98%)
Population (full shrinkage) 88% (0%-95%) 94% (88%-95%) 61% (0%-87%)
Subgroup (no shrinkage) 94% (92%-96%) 94% (92%-96%) 93% (92%-95%)
Summary across 150 subgroup-scenario combinations (25 subgroups × 6 scenarios), reported overall and separately for homogeneous and heterogeneous subgroups.
Monte Carlo SE of estimated coverage ≈ 0.7% (1000 simulations, 95% true coverage).

Continuous Endpoint

Performance Metrics for Continuous Endpoint
Criterion All subgroups Homogeneous Heterogeneous
RMSE
GM - Horseshoe (Low) 0.18 (0.15-0.63) 0.16 (0.15-0.19) 0.26 (0.18-0.63)
GM - Horseshoe (Mid) 0.18 (0.15-0.69) 0.16 (0.15-0.19) 0.27 (0.18-0.69)
GM - Horseshoe (Strong) 0.18 (0.15-0.70) 0.16 (0.15-0.19) 0.27 (0.18-0.70)
GM - R2D2 (Low) 0.18 (0.15-0.59) 0.16 (0.15-0.20) 0.25 (0.18-0.59)
GM - R2D2 (Mid) 0.18 (0.15-0.60) 0.16 (0.15-0.20) 0.25 (0.18-0.60)
GM - R2D2 (Strong) 0.18 (0.15-0.62) 0.16 (0.15-0.19) 0.25 (0.18-0.62)
OVAT - Hierarchical 0.21 (0.16-0.46) 0.20 (0.16-0.26) 0.24 (0.17-0.46)
Population (full shrinkage) 0.18 (0.15-0.80) 0.15 (0.15-0.19) 0.28 (0.19-0.80)
Subgroup (no shrinkage) 0.26 (0.17-0.41) 0.26 (0.17-0.40) 0.27 (0.17-0.41)
Absolute bias
GM - Horseshoe (Low) 0.05 (0.00-0.52) 0.02 (0.00-0.08) 0.17 (0.07-0.52)
GM - Horseshoe (Mid) 0.05 (0.00-0.59) 0.02 (0.00-0.09) 0.19 (0.08-0.59)
GM - Horseshoe (Strong) 0.05 (0.00-0.61) 0.02 (0.00-0.09) 0.20 (0.09-0.61)
GM - R2D2 (Low) 0.04 (0.00-0.51) 0.02 (0.00-0.08) 0.16 (0.06-0.51)
GM - R2D2 (Mid) 0.04 (0.00-0.51) 0.02 (0.00-0.08) 0.16 (0.06-0.51)
GM - R2D2 (Strong) 0.04 (0.00-0.51) 0.02 (0.00-0.08) 0.16 (0.07-0.51)
OVAT - Hierarchical 0.02 (0.00-0.29) 0.01 (0.00-0.06) 0.08 (0.03-0.29)
Population (full shrinkage) 0.06 (0.00-0.78) 0.02 (0.00-0.11) 0.23 (0.10-0.78)
Subgroup (no shrinkage) 0.01 (0.00-0.02) 0.01 (0.00-0.02) 0.01 (0.00-0.02)
Coverage of 95% CI
GM - Horseshoe (Low) 95% (64%-98%) 96% (94%-98%) 88% (64%-96%)
GM - Horseshoe (Mid) 93% (52%-97%) 96% (93%-97%) 83% (52%-94%)
GM - Horseshoe (Strong) 93% (48%-97%) 95% (92%-97%) 82% (48%-94%)
GM - R2D2 (Low) 95% (63%-98%) 97% (94%-98%) 89% (63%-97%)
GM - R2D2 (Mid) 95% (64%-98%) 97% (94%-98%) 89% (64%-96%)
GM - R2D2 (Strong) 95% (66%-98%) 96% (94%-98%) 88% (66%-96%)
OVAT - Hierarchical 96% (87%-99%) 97% (95%-99%) 95% (87%-98%)
Population (full shrinkage) 89% (0%-95%) 94% (88%-95%) 67% (0%-89%)
Subgroup (no shrinkage) 95% (94%-96%) 95% (94%-96%) 95% (94%-96%)
Summary across 150 subgroup-scenario combinations (25 subgroups × 6 scenarios), reported overall and separately for homogeneous and heterogeneous subgroups.
Monte Carlo SE of estimated coverage ≈ 0.7% (1000 simulations, 95% true coverage).

Additional TTE-Specific Visualizations

This section provides extra exploratory plots for the Time-to-Event endpoint.

RMSE Distribution by Subgroup (TTE, Scenario 3)


RMSE Across All Scenarios (TTE)


Bias Across All Scenarios (TTE)


Bias vs RMSE Scatter (TTE)


Coverage by Heterogeneity Status (TTE)


Additional Binary Endpoint Visualizations

RMSE Distribution by Subgroup (Binary, Scenario 3)


RMSE Across All Scenarios (Binary)


Bias Across All Scenarios (Binary)


Bias vs RMSE Scatter (Binary)


Coverage by Heterogeneity Status (Binary)


Additional Count Endpoint Visualizations

RMSE Distribution by Subgroup (Count, Scenario 3)


RMSE Across All Scenarios (Count)


Bias Across All Scenarios (Count)


Bias vs RMSE Scatter (Count)


Coverage by Heterogeneity Status (Count)


Additional Continuous Endpoint Visualizations

RMSE Distribution by Subgroup (Continuous, Scenario 3)


RMSE Across All Scenarios (Continuous)


Bias Across All Scenarios (Continuous)


Bias vs RMSE Scatter (Continuous)


Coverage by Heterogeneity Status (Continuous)


Summary

This analysis presents comprehensive simulation results across all four endpoint types. Key findings:

  1. Standardized RMSE plots show relative performance of all estimators compared to the baseline subgroup estimator across six simulation scenarios.

  2. Performance tables quantify RMSE, bias, and coverage metrics, stratified by subgroup heterogeneity (homogeneous vs heterogeneous subgroups).

  3. Endpoint-specific visualizations (TTE, Binary, Count, Continuous) provide additional insights into:

    • RMSE distribution across individual subgroups
    • Bias-RMSE trade-offs
    • Coverage performance by heterogeneity status

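The results compare the population, subgroup, Global (Horseshoe/R2D2), and OVAT approaches across endpoint types and simulation scenarios. In the x4 subgroups of Scenarios 2 and 3, leaving the x4 interactions unshrunk gives coverage of 93-96% and RMSE close to the subgroup estimator, whereas the standard GM - R2D2 (Mid) estimator has coverage of 64-80% in subgroup 4a.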


Analysis Date: 2025-12-30

R Version: R version 4.4.3 (2025-02-28)